IN THE CLAIMS: 



Please amend the claims as listed below. This listing of claims will replace all prior 
versions, and listings, of claims in the application. 

1-21. (Canceled) 

22. (Original) An apparatus comprising: 

a first circuit to receive a first packed data comprising at least four byte data 
elements; 

a second circuit to receive a second packed data comprising at least four byte data 
elements; 

a decoder to decode a plurality of instructions including a first instruction and a 
second instruction, the first instruction comprising a first source field indicating a first 
location to access said first packed data, and a second source field indicating a second 
location to access said second packed data; 

a multiply-adder circuit, enabled by the decoded first instruction, to multiply each 
of a first pair of byte data elements of the first packed data with respective byte data 
elements of the second packed data and to generate a first 16-bit result representing a first 
sum of products of the first pair of multiplications, and to multiply each of a second pair 
of byte data elements of the first packed data with respective byte data elements of the 
second packed data and to generate a second 16-bit result representing a second sum of 
products of the second pair of multiplications; 

a third circuit to store a third packed data comprising at least said first and second 
16-bit results in response to the first instruction; 

an adder circuit, enabled by the decoded second instruction, to add said first and 
second 16-bit results of the third packed data to generate a third 16-bit result representing 
a third sum of products of the first and second pairs of multiplications; and 

a fourth circuit to store a fourth packed data comprising at least said third 16-bit 
result in response to the second instruction. 
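
For illustration only, the two-step data flow recited in claim 22 can be sketched as below. This is a minimal sketch under stated assumptions, not the claimed circuitry: the function names `multiply_add_pairs` and `add_adjacent` and the sample values are hypothetical, the operands are plain Python integers, and overflow handling of the 16-bit results (which claim 22 does not specify) is ignored.

```python
def multiply_add_pairs(first, second):
    """Multiply corresponding elements, then sum each adjacent pair of products.

    Models the result of the first instruction: four byte elements per
    operand yield two sums of products (hypothetical helper, no overflow handling).
    """
    if len(first) != len(second) or len(first) % 2:
        raise ValueError("operands must have the same even number of elements")
    products = [a * b for a, b in zip(first, second)]
    return [products[i] + products[i + 1] for i in range(0, len(products), 2)]


def add_adjacent(results):
    """Add adjacent results, modeling the result of the second instruction."""
    if len(results) % 2:
        raise ValueError("need an even number of results")
    return [results[i] + results[i + 1] for i in range(0, len(results), 2)]


# Sample values are assumptions chosen for illustration.
first_packed = [1, 2, 3, 4]
second_packed = [5, -6, 7, 8]

# First instruction: 1*5 + 2*(-6) = -7 and 3*7 + 4*8 = 53
third_packed = multiply_add_pairs(first_packed, second_packed)

# Second instruction: -7 + 53 = 46, the sum of all four products
fourth_packed = add_adjacent(third_packed)
```

In this sketch `third_packed` plays the role of the third packed data (the first and second results) and `fourth_packed` the role of the fourth packed data (the third result).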

23. (Original) The apparatus of claim 22 wherein said first and second packed data each 
contain at least eight byte data elements. 

24. (Original) The apparatus of claim 22 wherein said first and second packed data each 
contain at least sixteen byte data elements. 

25. (Original) The apparatus of claim 22 wherein the first packed data comprises 
unsigned byte data elements. 

26. (Original) The apparatus of claim 22 wherein the second packed data comprises 
signed byte data elements. 

27. (Original) The apparatus of claim 26 wherein the first packed data comprises 
unsigned byte data elements. 

28. (Original) The apparatus of claim 27 wherein the first and second 16-bit results are 
generated using signed saturation. 

29. (Original) The apparatus of claim 22 wherein the multiply-adder circuit comprises a 
first and a second 16×16 multiplier to perform the first and the second pair of 
multiplications respectively. 

30. (Original) A computing system comprising: 

an addressable memory to store data; 
a processor including: 

a first storage area to store M packed unsigned byte data elements; 

a second storage area to store M packed signed byte data elements; 

a decoder to decode a first instruction comprising a first opcode field 
having a hexadecimal value of 0F38, a second opcode field having a hexadecimal value 
of 04, a first source field indicating said first storage area, and a second source field 
indicating said second storage area; 

an execution unit, responsive to the decoder decoding a first instruction, to 
produce M products of multiplication of the packed byte data elements stored in the first 
storage area by corresponding packed byte data elements stored in the second storage 
area, and to sum the M products of multiplication pairwise to produce M/2 results 
representing M/2 sums of products; and 

a third storage area to store M/2 packed 16-bit data elements, the third 
storage area corresponding to a destination specified by the first instruction to store the 
M/2 results; and 

a magnetic storage device to store said first instruction. 

31. (Original) The computing system of claim 30 wherein M is 16. 

32. (Original) The computing system of claim 30 wherein M is 8. 

33. (Previously Presented) The computing system of claim 32 wherein each of said 
M/2 16-bit results are generated using signed saturation. 
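
For illustration only, the arithmetic recited in claims 30 through 33 for M = 8 can be sketched as below. This is a minimal sketch under stated assumptions: the names `saturate_int16` and `multiply_add_unsigned_signed` and the sample values are hypothetical, and signed saturation is assumed to mean clamping each sum to the signed 16-bit range of -32768 to 32767.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def saturate_int16(value):
    """Clamp a value to the signed 16-bit range (assumed meaning of signed saturation)."""
    return max(INT16_MIN, min(INT16_MAX, value))


def multiply_add_unsigned_signed(unsigned_bytes, signed_bytes):
    """Form M products of unsigned-by-signed bytes and sum them pairwise.

    Returns M/2 results, each saturated to signed 16 bits (hypothetical helper).
    """
    if len(unsigned_bytes) != len(signed_bytes) or len(unsigned_bytes) % 2:
        raise ValueError("operands must have the same even number of elements")
    if any(not 0 <= u <= 255 for u in unsigned_bytes):
        raise ValueError("first operand must hold unsigned bytes")
    if any(not -128 <= s <= 127 for s in signed_bytes):
        raise ValueError("second operand must hold signed bytes")
    results = []
    for i in range(0, len(unsigned_bytes), 2):
        total = (unsigned_bytes[i] * signed_bytes[i]
                 + unsigned_bytes[i + 1] * signed_bytes[i + 1])
        results.append(saturate_int16(total))
    return results


# M = 8 sample operands (assumed values chosen to exercise both saturation limits).
m8_unsigned = [255, 255, 255, 255, 1, 2, 3, 4]
m8_signed = [127, 127, -128, -128, 1, 1, -1, -1]

# 255*127*2 = 64770 clamps to 32767; 255*(-128)*2 = -65280 clamps to -32768.
m8_results = multiply_add_unsigned_signed(m8_unsigned, m8_signed)
```

With these sample operands the sketch yields four results (M/2), of which the first two reach the positive and negative saturation limits and the last two, 3 and -7, are unaffected.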






